Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search

نویسنده

  • Rémi Coulom
چکیده

Monte-Carlo evaluation consists in estimating a position by averaging the outcome of several random continuations, and can serve as an evaluation function at the leaves of a min-max tree. This paper presents a new framework to combine tree search with Monte-Carlo evaluation, that does not separate between a min-max phase and a MonteCarlo phase. Instead of backing-up the min-max value close to the root, and the average value at some depth, a more general backup operator is defined that progressively changes from averaging to min-max as the number of simulations grows. This approach provides a fine-grained control of the tree growth, at the level of individual simulations, and allows efficient selectivity methods. This algorithm was implemented in a 9× 9 Go-playing program, Crazy Stone, that won the 10th KGS computer-Go tournament.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search

Over the past decade, Monte Carlo Tree Search (MCTS) and specifically Upper Confidence Bound in Trees (UCT) have proven to be quite effective in large probabilistic planning domains. In this paper, we focus on how values are backpropagated in the MCTS tree, and apply complex return strategies from the Reinforcement Learning (RL) literature to MCTS, producing 4 new MCTS variants. We demonstrate ...

متن کامل

Monte-Carlo Tree Search: To MC or to DP?

State-of-the-art Monte-Carlo tree search algorithms can be parametrized with any of the two information updating procedures: MC-backup and DP-backup. The dynamics of these two procedures is very different, and so far, their relative pros and cons have been poorly understood. Formally analyzing the dependency of MCand DP-backups on various MDP parameters, we reveal numerous important issues that...

متن کامل

On MABs and Separation of Concerns in Monte-Carlo Planning for MDPs

Linking online planning for MDPs with their special case of stochastic multi-armed bandit problems, we analyze three state-of-the-art Monte-Carlo tree search algorithms: UCT, BRUE, and MaxUCT. Using the outcome, we (i) introduce two new MCTS algorithms, MaxBRUE, which combines uniform sampling with Bellman backups, and MpaUCT, which combines UCB1 with a novel backup procedure, (ii) analyze them...

متن کامل

Developments on Product Propagation

Product Propagation (PP) is an algorithm to backup probabilistic evaluations for abstract two-player games. It was shown that PP could solve go problems as efficiently as Proof Number Search (PNS). In this paper, we exhibit a few domains where, for generic non-optimized versions, PP performs better than previously known algorithms for solving games. The compared approaches include alpha-beta se...

متن کامل

Paranoia versus Overconfidence in Imperfect- Information Games

In minimax game-tree search, the min part of the minimax backup rule derives from what we will call the paranoid assumption: the assumption that the opponent will always choose a move that minimizes our payoff and maximizes his/her payoff (or our estimate of the payoff, if we cut off the search before reaching the end of the game). A potential criticism of this assumption is that the opponent m...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006